spatial redundancy
In video recognition, we need to sample multiple frames to represent each video, which makes the computational cost scale proportionally to the number of sampled frames. In most cases, only a small proportion of all the frames is sampled for each input, so each input captures only limited information from the original video.
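As a minimal illustration of this sampling step (a sketch, not drawn from any particular paper below), the snippet picks a fixed budget of evenly spaced frames from a clip; the clip shape, frame budget, and helper name are assumptions:

```python
import torch

def uniform_sample(video, num_frames):
    """Pick `num_frames` evenly spaced frames from a (T, C, H, W) clip;
    per-clip compute then scales linearly with `num_frames`."""
    T = video.shape[0]
    idx = torch.linspace(0, T - 1, num_frames).long()
    return video[idx]

clip = torch.randn(300, 3, 224, 224)           # a 300-frame video
frames = uniform_sample(clip, num_frames=8)    # only 8 of 300 frames kept
```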
Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification
The accuracy of deep convolutional neural networks (CNNs) generally improves when fueled with high-resolution images. However, this often comes at a high computational cost and a large memory footprint. Inspired by the fact that not all regions in an image are task-relevant, we propose a novel framework that performs efficient image classification by processing a sequence of relatively small inputs, which are strategically selected from the original image with reinforcement learning. Such a dynamic decision process naturally facilitates adaptive inference at test time, i.e., it can be terminated once the model is sufficiently confident about its prediction, thus avoiding further redundant computation. Notably, our framework is general and flexible, as it is compatible with most state-of-the-art lightweight CNNs (such as MobileNets, EfficientNets and RegNets), which can be conveniently deployed as the backbone feature extractor. Experiments on ImageNet show that our method consistently improves the computational efficiency of a wide variety of deep models. For example, it further reduces the average latency of the highly efficient MobileNet-V3 on an iPhone XS Max by 20% without sacrificing accuracy.
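A rough sketch of the adaptive early-exit inference loop this abstract describes: process a sequence of small regions and stop as soon as the prediction is confident. The `backbone`, the `select_next_region` policy (learned with reinforcement learning in the paper), the glance resolution, and the threshold are all hypothetical stand-ins, not the paper's exact method:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def glance_and_focus(image, backbone, select_next_region,
                     max_steps=5, threshold=0.9):
    """Classify a (3, H, W) image via a sequence of small inputs,
    terminating early once the prediction is confident."""
    # "Glance": start from a cheap, downsampled view of the full image.
    region = F.interpolate(image.unsqueeze(0), size=(96, 96),
                           mode='bilinear', align_corners=False)
    logits_sum = None
    for step in range(1, max_steps + 1):
        out = backbone(region).squeeze(0)          # (num_classes,)
        logits_sum = out if logits_sum is None else logits_sum + out
        probs = F.softmax(logits_sum / step, dim=-1)
        if probs.max() >= threshold:
            break  # confident enough: skip the remaining regions
        # "Focus": crop the next task-relevant region; in the paper this
        # selection policy is learned with reinforcement learning.
        region = select_next_region(image, logits_sum)
    return probs.argmax().item()
```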
CompTrack: Information Bottleneck-Guided Low-Rank Dynamic Token Compression for Point Cloud Tracking
Zhou, Sifan, Cao, Yichao, Nie, Jiahao, Fu, Yuqian, Zhao, Ziyu, Lu, Xiaobo, Wang, Shuo
Despite great recent progress, the inherent sparsity of point clouds introduces a dual-redundancy challenge that limits existing trackers: (1) vast spatial redundancy from background noise impairs accuracy, and (2) informational redundancy within the foreground hinders efficiency. To tackle these issues, we propose CompTrack, a novel end-to-end framework that systematically eliminates both forms of redundancy in point clouds. First, CompTrack incorporates a Spatial Foreground Predictor (SFP) module to filter out irrelevant background noise based on information entropy, addressing spatial redundancy. Its core is then an Information Bottleneck-guided Dynamic Token Compression (IB-DTC) module that eliminates the informational redundancy within the foreground. Theoretically grounded in low-rank approximation, this module leverages an online SVD analysis to adaptively compress the redundant foreground into a compact and highly informative set of proxy tokens. Extensive experiments on the KITTI, nuScenes and Waymo datasets demonstrate that CompTrack achieves top-tier tracking performance with superior efficiency, running in real time at 90 FPS on a single RTX 3090 GPU.
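The low-rank idea behind IB-DTC can be illustrated with a truncated SVD over the foreground tokens, keeping only the dominant singular directions as compact proxy tokens. The rank choice, the scaling, and the function name below are illustrative assumptions, not the paper's exact construction:

```python
import torch

def compress_tokens_svd(tokens, rank):
    """Compress N foreground tokens (N, D) into `rank` proxy tokens
    via truncated SVD, keeping the dominant low-rank structure."""
    # tokens: (N, D); full_matrices=False gives U: (N, K), S: (K,), Vh: (K, D)
    U, S, Vh = torch.linalg.svd(tokens, full_matrices=False)
    # Proxy tokens: the top-`rank` right singular vectors, scaled by their
    # singular values so each proxy carries its share of the energy.
    proxies = S[:rank, None] * Vh[:rank]           # (rank, D)
    return proxies

tokens = torch.randn(256, 128)                     # 256 foreground tokens
proxies = compress_tokens_svd(tokens, rank=16)     # 16 compact proxy tokens
```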
Learning Compact Vision Tokens for Efficient Large Multimodal Models
Large multimodal models (LMMs) face significant computational challenges due to the high cost of Large Language Models (LLMs) and the quadratic complexity of processing long vision token sequences. In this paper, we explore the spatial redundancy among vision tokens and shorten the vision token sequence to accelerate inference. Specifically, we propose a Spatial Token Fusion (STF) method that learns compact vision tokens for a short vision token sequence, where spatially adjacent tokens are fused into one. Meanwhile, a weight-frozen vision encoder cannot adapt well to the demands of diverse downstream vision-language tasks. To this end, we further introduce a Multi-Block Token Fusion (MBTF) module to supplement multi-granularity features for the reduced token sequence. Overall, we combine the STF and MBTF modules to balance token reduction against information preservation, thereby improving inference efficiency without sacrificing multimodal reasoning capability. Experimental results demonstrate that our method, based on LLaVA-1.5, achieves comparable or even superior performance to the baseline on 8 popular vision-language benchmarks with only 25% of the baseline's vision tokens. The source code and trained weights are available at https://github.com/visresearch/LLaVA-STF.
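A minimal sketch of what spatial token fusion might look like: each 2×2 neighborhood of patch tokens is concatenated and projected back to a single token, cutting the sequence to 25% of its length. The 2×2 window, the plain linear fusion layer, and the class name are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class SpatialTokenFusion(nn.Module):
    """Fuse each 2x2 neighborhood of vision tokens into one token,
    shortening the sequence by 4x (a sketch of the STF idea)."""
    def __init__(self, dim):
        super().__init__()
        self.fuse = nn.Linear(4 * dim, dim)

    def forward(self, tokens, grid_size):
        # tokens: (B, H*W, D), laid out as an H x W grid of patch tokens
        B, N, D = tokens.shape
        H = W = grid_size
        x = tokens.view(B, H, W, D)
        # Gather each 2x2 block into one vector of size 4*D ...
        x = x.view(B, H // 2, 2, W // 2, 2, D)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(B, (H // 2) * (W // 2), 4 * D)
        # ... and project it back down to one D-dimensional token.
        return self.fuse(x)

stf = SpatialTokenFusion(dim=1024)
tokens = torch.randn(2, 576, 1024)   # e.g. a 24x24 patch grid, as in LLaVA-1.5
short = stf(tokens, grid_size=24)    # (2, 144, 1024): 25% of the tokens
```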
SAC-ViT: Semantic-Aware Clustering Vision Transformer with Early Exit
Hu, Youbing, Cheng, Yun, Lu, Anqi, Wei, Dawei, Li, Zhijun
The Vision Transformer (ViT) excels at global modeling but faces deployment challenges on resource-constrained devices due to the quadratic computational complexity of its attention mechanism. To address this, we propose the Semantic-Aware Clustering Vision Transformer (SAC-ViT), a non-iterative approach to enhancing ViT's computational efficiency. SAC-ViT operates in two stages: Early Exit (EE) and Semantic-Aware Clustering (SAC). In the EE stage, downsampled input images are processed to extract global semantic information and generate initial inference results. If these results do not meet the EE termination criteria, the information is clustered into target and non-target tokens. In the SAC stage, target tokens are mapped back to the original image, cropped, and embedded. These target tokens are then combined with the non-target tokens reused from the EE stage, and the attention mechanism is applied within each cluster. This two-stage design, with end-to-end optimization, reduces spatial redundancy and enhances computational efficiency, significantly boosting overall ViT performance. Extensive experiments demonstrate the efficacy of SAC-ViT, reducing the FLOPs of DeiT by 62% and achieving a 1.98× throughput improvement without compromising performance.
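A compressed sketch of the two-stage control flow this abstract describes, assuming a hypothetical `vit` that returns both logits and tokens, and a `refine_targets` stage standing in for the paper's semantic-aware clustering; the threshold, scale factor, and names are illustrative:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sac_vit_inference(image, vit, refine_targets, threshold=0.8):
    """Two-stage inference in the spirit of SAC-ViT: a cheap pass on a
    downsampled image, then a targeted second pass only if needed."""
    # EE stage: downsample and run the ViT once for a cheap prediction.
    small = F.interpolate(image, scale_factor=0.5, mode='bilinear',
                          align_corners=False)   # image: (B, 3, H, W)
    logits, tokens = vit(small)                  # assumed to return both
    probs = F.softmax(logits, dim=-1)
    if probs.max(dim=-1).values.min() >= threshold:
        return probs                             # early exit: skip stage two
    # SAC stage: re-process only the task-relevant ("target") tokens,
    # mapped back to full-resolution crops; non-target tokens are reused.
    return refine_targets(image, tokens)
```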